A method and arrangement for speech recognition wherein a volume distance is
determined between recognized words and the pauses lying between them. When the
volume distance of a word is lower than a predetermined threshold, the word is
evaluated as being incorrectly recognized, such that errors caused by unwanted noises
are avoided.

REMARKS 

A substitute specification is provided herewith which makes editorial changes in
order to conform to standard US practice. A marked-up copy of the specification is also
provided reflecting the changes made.

In addition, the claims as filed have been cancelled and replaced by new claims
that more clearly set forth the subject matter of Applicants' invention.

No new matter has been inserted in the application.

Applicants submit that this application is in proper condition for examination,
which action is respectfully requested.

Respectfully submitted.

METHOD AND SYSTEM FOR SPEECH RECOGNITION

BACKGROUND OF THE INVENTION

Field of the Invention

In general, the present invention is directed to speech recognition systems.
In particular, the present invention is directed to automatic detection of
speech recognition errors.

Discussion of the Related Art

Methods for automatic speech recognition are often utilized in
speech recognition systems. Applications of speech recognition systems are,
for example, dictating systems or automatically operating telephone
exchanges.

It is especially critical in speech recognition that the
correct expressions of the correct speaker are recognized. This is
problematic insofar as an ambient noise in which clear speech
constituents are contained can be interpreted by a speech
recognition system as though it derived from the speaker of the
speech actually to be recognized. In order to prevent a mix-up, a
method is herewith disclosed for distinguishing the correct from the
incorrect spoken language. In particular, the level of the speaker whose
speech is to be recognized is usually clearly higher than that of speech
in the unwanted noise, which usually comes from the background. The
volume level of the speaker whose speech is to be recognized can thus
be used to distinguish it from the background noise.

In previously known methods for the automatic recognition of
speech, recognition errors are frequently caused by unwanted noises. A
distinction is made between two types of unwanted noise: the
speech of another speaker, which is in fact usually correctly recognized but
is not to be assigned to the voice signal of the actual speaker, and a
background noise not representing a voice signal, such as
breathing sounds, that is incorrectly recognized as speech.
The unwanted noises represent a considerable source of error in the
automatic recognition of speech.

In order to avoid such errors, speech recognition systems are
trained to the speech of the individual speakers, so that the speech
recognition system can determine whether the acoustic
signal derives from the speaker or is a background noise. Speech recognition
systems having frequently changing speakers cannot be trained for every
individual speaker. In a speech recognition system integrated in a
telephone system, for instance, it is impossible to carry out a training phase
lasting a number of minutes for every caller before the caller can speak his
message, which often lasts only a fraction of a minute.

SUMMARY OF THE INVENTION
It is an object of the present invention to provide a method for
enabling recognition of speech wherein recognition errors produced by
unwanted noises are reduced.

It is another object of the invention to provide a method for
determining words and pauses on the basis of word boundaries.

It is a further object of the invention to provide a method
wherein an average silence volume can be determined during speech
pauses.

It is an additional object of the invention to provide a method 
to determine average word volume. 


It is yet another object of the invention to provide a method to 
determine a difference between the average word volume and the 
average silence volume. 

It is yet a further object of the invention to provide a method 
wherein speech is recognized when a difference between an average 
word volume and an average silence volume is greater than a 
predetermined threshold. 

These and other objects of the invention will become apparent 
upon careful review of the following disclosure, which is to be read in 
conjunction with review of the accompanying drawing figures. 

BRIEF DESCRIPTION OF THE DRAWINGS
Figure 1 shows a flowchart according to the present invention;
Figure 2 shows a part of a signal segment according to the
present invention;
Figure 3 shows a circuit diagram of a telecommunication
system according to the present invention.

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS
Fig. 1 schematically shows a method for the automatic
recognition of speech. This method is realized in practice by a
computer program that works on a computer or a processing unit
comprising an input for a voice signal.

The method is started in Step S1. In the following Step S2, a
word of a speech signal S is analyzed. This analysis proceeds such
that the acoustic voice signal, which is usually present as a signal
converted into an electrical signal, is segmented into words and pauses
and the words are converted into text. The segmentation of the signal
section can be carried out, for example, according to the Viterbi
alignment method.

An average silence volume (Si level) is determined during the
pauses. An average word volume (Wo level) for the words is also
determined. Further, a difference (A) between the average
word volume (Wo level) and the average silence volume (Si level) is also
determined. Speech is recognized when the difference (A) between the
average word volume (Wo level) and the average silence volume (Si level) is
greater than a predetermined threshold (S). Otherwise, a
recognition of speech is not carried out in this range.

The difference A forms a volume distance between the spoken
words and the noises in the pauses. When the volume distance of a
recognized word is too slight, it is interpreted as an incorrectly
recognized word. A determination is thus made as to whether a word has a
predetermined volume distance from the remaining noise level.
Background noises that often lead to
incorrect recognitions in traditional methods for automatic speech recognition
are not as loud as the words spoken by the speaker. These background
noises can simply be filtered out using the method of the
present invention, regardless of whether they contain words or are noises
that do not represent a voice signal.

The method of the present invention can also be
used such that only the average volume
need be determined over parts of the speech signal segment to be analyzed.

The term volume refers to any physical quantity that is
approximately proportional to the physical volume measured in decibels.
Proportional quantities relating thereto are the energy of the acoustic signal
or, respectively, of an electrical signal, such as, for example, voltage
or current.

Fig. 2 shows a diagram of a signal segment S in a
coordinate system. In this coordinate system, time t is plotted on
the abscissa and the volume is plotted on the ordinate. The volume is
given as the logarithm of the energy E of the signal S.
Quantities proportional to this are, in addition to the energy of the
signal segment S, the electrical quantities of the acoustic signal converted
into an electrical signal, such as voltage or
current.
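
By way of illustration only, the volume of a short stretch of the signal S
could be computed as the logarithm of its energy, as in the following Python
sketch. The function name frame_volume, the decibel scaling factor of 10 and
the floor value that guards against taking the logarithm of zero are
assumptions made for this example and are not taken from the disclosure.

```python
import math


def frame_volume(samples, floor=1e-10):
    """Return the volume of one frame as 10 * log10 of its mean energy.

    `samples` is a non-empty sequence of amplitude values (hypothetical
    input format). `floor` is an assumed lower bound on the energy so
    that pure silence does not produce log(0).
    """
    if not samples:
        raise ValueError("frame must contain at least one sample")
    energy = sum(s * s for s in samples) / len(samples)
    return 10.0 * math.log10(max(energy, floor))


if __name__ == "__main__":
    loud = [1.0, -1.0, 1.0, -1.0]
    quiet = [0.1, -0.1, 0.1, -0.1]
    # The loud frame has ten times the amplitude, i.e. 100 times the energy.
    print(frame_volume(loud), frame_volume(quiet))
```

Any quantity proportional to this value, such as a voltage-based measure,
could be substituted without changing the comparison described below.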

In the segmentation of the signal segment S, points in
time t1, t2 are defined that respectively define a boundary between a pause P
and a word W. A pause is
present between time zero and t1 and again following t2,
and the signal S represents a word between the
points in time t1 and t2.

An average silence volume Si level is determined in Step S3. The
average silence volume Si level is the time average of the volume of
one or more pause segments P.

In Step S4, an average word volume Wo level is determined. The
average word volume Wo level is the time average of the
volume of an individual word segment W, i.e., a separate Wo level is
calculated for each individual word.

In the following Step S5, a difference A is calculated between the
average word volume Wo level and the average silence volume Si level:

A = Wo level - Si level

Subsequently, an inquiry is carried out in Step S6 to see whether the
difference A is lower than a threshold SW. The threshold SW represents the
required "volume distance" (also see Fig. 2).

When this inquiry shows that the difference A is smaller than the
threshold SW, the volume distance between the average
word volume Wo level and the average silence volume Si level is less than
the predetermined threshold SW. A word whose volume distance is
lower than the predetermined threshold SW is evaluated as having been
incorrectly recognized. The inventors of the present invention have
found that unwanted noises are usually not as loud as the word signals to
be evaluated, and that, given a constant unwanted noise (noise in the line, loud
background noise) in which no satisfactory speech recognition is possible,
the volume distance between the average word volume and the average
silence volume is extremely slight. When the acquired signal is converted
into text in either instance, the result is almost always an incorrect
recognition. When the inquiry in Step S6 yields that the difference A is lower
than the threshold SW, the program execution branches to Step
S7, in which an error elimination is implemented; this is explained later.
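
Steps S3 through S6 can be sketched in a few lines of Python. This is a
minimal illustration under stated assumptions, not the program of the
disclosure: the function names, the representation of a word or pause as a
list of per-frame volume values, and the example threshold of 15 are all
hypothetical.

```python
from statistics import mean


def volume_distance(word_volumes, pause_volumes):
    """Step S5: A = Wo level - Si level.

    `word_volumes` and `pause_volumes` are lists of per-frame volumes
    (assumed format). Their means are the Wo level (Step S4) and the
    Si level (Step S3).
    """
    wo_level = mean(word_volumes)
    si_level = mean(pause_volumes)
    return wo_level - si_level


def is_incorrectly_recognized(word_volumes, pause_volumes, threshold_sw):
    """Step S6: True when A is lower than SW, i.e. branch to Step S7."""
    return volume_distance(word_volumes, pause_volumes) < threshold_sw


if __name__ == "__main__":
    pause = [-50.0, -52.0, -51.0]    # quiet pause before the word
    spoken = [-21.0, -19.0, -20.0]   # word from the actual speaker
    breath = [-46.0, -44.0]          # soft noise misread as a word
    for name, seg in (("spoken", spoken), ("breath", breath)):
        print(name, volume_distance(seg, pause),
              is_incorrectly_recognized(seg, pause, threshold_sw=15.0))
```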

Subsequently, a check is carried out in Step S8 to see whether a
further word is to be evaluated. When the result in Step S6 is that the
difference A is greater than the threshold SW, the program execution
branches directly to the inquiry in Step S8.

The inquiry in Step S8 checks whether a
further word is yet to be analyzed and interpreted. If the result is
"yes", the program execution branches back to Step S2; otherwise,
the program is ended with Step S9.

Acquired words are individually analyzed, converted into text and
interpreted. This method is referred to as pace-keeping recognition. It is
expedient here that the difference A is formed between the average word
volume Wo level of a word W and the average silence volume Si level of the
immediately preceding pause P. However, it is also possible to
employ the average silence volume of the pause following the word W or to
employ a silence volume averaged over the preceding and the following pause.
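
The choice of reference pause in pace-keeping recognition can be illustrated
as follows. The sketch assumes that each pause is available as a list of
per-frame volumes; the function name silence_level, the mode strings, and the
reading of the last option as an average over both neighbouring pauses are
assumptions made for this example.

```python
from statistics import mean


def silence_level(preceding_pause, following_pause, mode="preceding"):
    """Return the Si level used for one word W.

    mode="preceding": pause immediately before the word
    mode="following": pause immediately after the word
    mode="both":      all frames of both neighbouring pauses
    """
    if mode == "preceding":
        return mean(preceding_pause)
    if mode == "following":
        return mean(following_pause)
    if mode == "both":
        return mean(list(preceding_pause) + list(following_pause))
    raise ValueError(f"unknown mode: {mode}")


if __name__ == "__main__":
    before = [-50.0, -52.0]
    after = [-48.0, -46.0]
    for mode in ("preceding", "following", "both"):
        print(mode, silence_level(before, after, mode))
```

Using only the preceding pause allows a word to be evaluated as soon as it
ends, which suits word-by-word processing; the other two modes require
waiting for the pause after the word.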

Instead of a pace-keeping recognition, a recognition combining
several words can also be employed. Usually, a complete sentence is
registered as a signal segment and then analyzed in one
piece (sentence-by-sentence recognition). Given such a
sentence-by-sentence recognition, the silence volume can be
averaged over all pauses P, whereas the average word volume is
determined individually for each word W, so that the individual words
can be evaluated as correctly or incorrectly recognized.
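
A sentence-by-sentence evaluation of this kind might look as follows in
Python. The segment format (kind, text, per-frame volumes), the function name
evaluate_sentence and the example threshold are assumptions made for
illustration; averaging over all pause frames, rather than over per-pause
averages, is likewise an assumption.

```python
from statistics import mean


def evaluate_sentence(segments, threshold_sw):
    """Evaluate every word of one sentence against a shared Si level.

    `segments` is a list of (kind, text, volumes) tuples, where kind is
    "word" or "pause" and volumes is a list of per-frame volumes.
    Returns a list of (text, accepted) pairs, accepted being False for
    words evaluated as incorrectly recognized (A lower than SW).
    """
    pause_frames = [v for kind, _, vols in segments
                    if kind == "pause" for v in vols]
    si_level = mean(pause_frames)  # averaged over all pauses P
    results = []
    for kind, text, vols in segments:
        if kind == "word":
            difference = mean(vols) - si_level  # separate Wo level per word
            results.append((text, difference >= threshold_sw))
    return results


if __name__ == "__main__":
    sentence = [
        ("pause", "", [-50.0, -50.0]),
        ("word", "call", [-20.0, -22.0]),
        ("pause", "", [-50.0]),
        ("word", "uh", [-45.0, -43.0]),
        ("pause", "", [-50.0, -50.0]),
    ]
    print(evaluate_sentence(sentence, threshold_sw=15.0))
```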

Dependent on the application, there are various versions of the
error elimination in Step S7, which can be utilized individually or in
combination.

According to a first version, words that have been evaluated as
incorrectly recognized are not taken into consideration in the conversion
into a text or are removed therefrom.

According to a second version of error elimination, a
corresponding message is output to the user given a word deemed incorrectly
recognized. The message can be output as an acoustic message (for
example, "the last word was not correctly understood") or can be displayed
graphically. The former is expedient for speech recognition systems
without a display, such as, for example, telecommunication systems with
automatic speech recognition, and the latter can be
meaningful, for example, given a dictating system. In dictating systems, a
predetermined error character can be inserted at the corresponding location
in the text as a graphic presentation, prompting the user to
speak the word again; the word is then automatically introduced at the location
of the error character in the text. When the user does not wish to insert a
word there, he can actuate a corresponding delete function for eliminating
the error character.

According to a third version of the error elimination, the user can
be prompted by a corresponding message to speak louder, so that the
required volume distance is achieved. As a result, the voice input is
adapted to the acoustic conditions (noise level at the speaker) or to
the conditions of the transmission (noise on the line) of the
acoustic signal. When a repeated prompt to speak louder does not
lead to an improved recognition result, the user can also be prompted to
create different acoustic conditions or transmission conditions; for
example, the user is requested to telephone from a different
telephone set if the user is connected to the speech recognition system via a
telephone.

According to a fourth version of the error elimination, when a
plurality of words are successively evaluated as incorrectly recognized, this is
evaluated as inadequate quality of the speech input and is indicated to the
user with a corresponding message.

According to a fifth version of the error elimination, the words of
what are referred to as n-best lists are individually interpreted. Often, a
number of words that sound similar can be allocated to a signal sequence.
These words form the n-best list. Since the boundaries between the pauses
and the respective word differ for the individual words of the n-best list,
different average word volumes and, accordingly, different differences A can be
determined for the individual words of the n-best list.

The selection of the word of the n-best list that is inserted into the
text ensues according to known match criteria. The difference A can
be employed as an additional match criterion, such that the word
having the greatest difference A is inserted into the text. This fifth version
of the error elimination forms an independent idea of the invention that can
also be utilized in the automatic evaluation of n-best lists independently of the
above-described method.
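
The use of the difference A as a match criterion for n-best lists can be
sketched as follows. The sketch is a simplification under stated assumptions:
each candidate is given as (text, start, end) with frame indices marking its
word boundaries, all frames outside the word are treated as pause, and the
known match criteria mentioned above are omitted, so that A alone decides.
All names are hypothetical.

```python
from statistics import mean


def candidate_difference(frame_volumes, start, end):
    """Difference A for one candidate whose word spans frames [start, end).

    Frames outside the word are treated as pause frames (assumption);
    at least one pause frame must remain.
    """
    word = frame_volumes[start:end]
    pause = frame_volumes[:start] + frame_volumes[end:]
    return mean(word) - mean(pause)


def pick_from_n_best(frame_volumes, candidates):
    """Return the text of the candidate with the greatest difference A."""
    best = max(candidates,
               key=lambda c: candidate_difference(frame_volumes, c[1], c[2]))
    return best[0]


if __name__ == "__main__":
    volumes = [-50.0, -50.0, -20.0, -20.0, -20.0, -50.0, -50.0]
    n_best = [("wreck", 1, 5), ("recognize", 2, 5)]
    # "wreck" includes one quiet frame, which lowers its word volume.
    print(pick_from_n_best(volumes, n_best))
```

In a fuller implementation, A would presumably be combined with the
recognizer's own scores rather than replace them.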

The threshold SW can be constant. However, it is also possible to
automatically adapt the threshold SW to the acoustic conditions and to the
signal transmission conditions. When there are excellent acoustic conditions
and signal transmission conditions, high differences A are usually
achieved, these being significantly higher than constant thresholds that must
be suitable for different applications and conditions. In such a case, it is
expedient to adapt the threshold to the higher differences A. Thus,
for example, a global difference Ag1 can be calculated between the average
word volume of a plurality of acquired words and the average silence volume
of a plurality of acquired pauses, and this global difference Ag1 can be
employed as threshold SW, either directly or after the subtraction of a
predetermined, constant amount. This is particularly
useful in combination with the first version of the error elimination, since
background noises that are only slightly softer than the average word
volume can also be filtered out as a result. Consequently,
given a speech input of high quality, the threshold below which the signals
are evaluated as incorrectly recognized words is set higher than given a
speech input of poorer quality. Preferably, a lower limit is provided for the
threshold, so that it cannot be reduced to zero.
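
The adaptive threshold can be illustrated with a short Python sketch. The
function name adaptive_threshold and the example values chosen for the
constant amount (10) and for the lower limit (6) are assumptions made for
this example; the disclosure does not specify numerical values.

```python
from statistics import mean


def adaptive_threshold(word_volumes, pause_volumes,
                       constant_amount=10.0, lower_limit=6.0):
    """Derive the threshold SW from the global difference.

    `word_volumes` holds the Wo levels of a plurality of acquired words,
    `pause_volumes` the Si levels of a plurality of acquired pauses.
    The global difference is reduced by `constant_amount` and is not
    allowed to fall below `lower_limit`.
    """
    global_difference = mean(word_volumes) - mean(pause_volumes)
    return max(global_difference - constant_amount, lower_limit)


if __name__ == "__main__":
    # Good conditions: large global difference, hence a high threshold.
    print(adaptive_threshold([-20.0, -22.0], [-50.0, -52.0]))
    # Poor conditions: the threshold is clamped at its lower limit.
    print(adaptive_threshold([-40.0], [-45.0]))
```

Passing constant_amount=0.0 corresponds to employing the global difference
directly as the threshold.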

The height of the variable threshold can also be evaluated as a
quality factor of the speech input. When the variable threshold reaches its
lower limit, this means that the quality of the speech input is relatively
poor, which can be correspondingly communicated to the user.

In the calculation of the global difference, all pauses and words that 
are spoken during a conversation with the speech recognition system are 
preferably taken into consideration. 

Fig. 3 shows an exemplary embodiment of an apparatus for
speech recognition. This apparatus is a telecommunication system 1 that is
connected to the telephone network via a network line 2. The
telecommunication system 1 comprises a subscriber
access control 3 with which telephone subscribers calling from
the outside can be connected via an internal bus 4, a digital-to-audio
processor 5 and local telephone lines 6 to a telephone terminal 7 or,
respectively, to the user using the telephone terminal. The internal bus 4 is
connected to an announcement unit 8 and to a voice unit 9. Announcements
can be introduced onto the bus 4 and, thus, onto the telephone lines 2, 6 with
the announcement unit 8. The telecommunication system is controlled by a
microprocessor 10 that is connected to the digital-to-audio processor 5, to the
announcement unit 8 and to the voice unit 9.

The voice unit 9 is composed of a speech analysis module 11, a
volume measuring means 12 and a voice control 13.

The speech analysis module 11 carries out the analysis of the
voice signal, whereby the voice signal is segmented into pauses and words
and the words are converted into text. The speech analysis module conducts
the individual parts (words W and pauses P) of the speech signal S to the
volume measuring means 12 and forwards the converted text to the voice
control 13. The volume measuring means determines the average volume
(Wo level, Si level) of the individual parts of the speech signal and forwards
the corresponding values to the voice control 13. A check is carried out in
the voice control 13 to see whether the individual words have been correctly
recognized (Step S6 in Fig. 1), whereby filtering of incorrectly recognized
words is potentially undertaken in the voice control 13 (first version of the
error elimination).

The filtered or unfiltered text is forwarded from the voice control
13 together with further data needed for the error elimination to the
microprocessor 10, which evaluates the received text and the corresponding
data.

One function of the microprocessor 10 is to automatically connect
the incoming calls to the respective telephone terminals 7. This ensues by
interpreting the text received from the voice control 13 and by enabling the
respective output of the digital-to-audio processor 5.

When the received text cannot be interpreted or when an error 
elimination with announcements (second, third or fourth version) is 
necessary, then the announcement unit 8 is driven by the microprocessor to 
implement the corresponding announcement. 

An automatic switching is thus integrated into the inventive 
telecommunication system, this being capable of automatically forwarding 
incoming telephone calls to the respective telephone terminals. 

The telecommunication system 1 also makes it possible
for the users of the telephone terminals 7 to control the telecommunication
system 1 with their voice and, for example, to speak the number to be selected
instead of typing it on the keys.

All of these functions assume an optimally error-free speech
recognition. As a result of the invention, errors due to background noises,
whether as a result of a speech signal in the background or a noise that does
not represent a speech signal, can be avoided significantly better and in a
simpler way than with traditional speech recognition systems.
Patent Claims
1. Method for speech recognition, whereby
a) words and pauses in the speech are determined on the basis of word
boundaries;

b) an average silence volume (Si level) during the pauses is determined;

c) an average word volume (Wo level) for the words is determined;

d) a difference (A) is determined between the average word volume (Wo
level) and the average silence volume (Si level);

e) whereby speech is recognized when the difference (A) between the
average word volume (Wo level) and the average silence volume (Si level) is
greater than a predetermined threshold (S);

f) otherwise, no recognition of the speech is implemented.

2. Method according to claim 1, whereby the average silence volume and the
average word volume are measured as logarithm via the acquired energy.

3. Method according to claim 1 or 2, whereby the global difference between
the average word volume of a plurality of segmented words and the average
silence volume of a plurality of segmented pauses is calculated, and the
threshold is defined on the basis of the global difference.

4. Method according to claim 3, whereby the threshold is equated with the
global difference.

5. Method according to claim 3, whereby the global difference is diminished
by a predetermined, constant amount and the volume amount deriving
therefrom is employed as threshold.

6. Method according to claim 1 or 2, whereby a constant threshold is
employed.

7. Method according to one of the claims 1 through 6, whereby a word for
which no speech recognition is implemented is not taken into further
consideration.

8. Method according to one of the claims 1 through 7, whereby a message is
output to a user when no speech recognition is implemented.

9. Method according to claim 8, whereby the user is prompted with the
message to speak louder and/or to repeat the incorrectly recognized word.

10. Method according to claim 8, whereby the user is prompted with the
message to speak louder, so that an adequate distance is achieved between
the average word volume and the average silence volume.

11. Method according to one of the preceding claims, whereby the average
silence volume is respectively determined for an individual pause and the
difference (A) is determined between the average word volume (Wo level) of
the spoken word and the average silence volume (Si level) of the immediately
preceding pause or the immediately following pause.

12. Method according to one of the preceding claims, whereby the average
silence volume is averaged over a plurality of successive pauses and this ...

14. Arrangement for speech recognition ... configured such that ...
boundaries; ... an average silence volume (Si level) during the pauses is
determined; ... level) and the average silence volume ... predetermined
threshold (S);
Abstract

Method and Arrangement for Speech Recognition
Inventively, a volume distance is determined between the recognized words
and the pauses lying between them. When the volume distance of a word is
lower than a predetermined threshold, then the word is evaluated as
incorrectly recognized. As a result thereof, errors caused by unwanted noises
are avoided in a simple way.

(Figure 2)

Although modifications and changes may be suggested by
those skilled in the art to which this invention pertains, it is the intention
of the inventors to embody within the patent warranted hereon all
changes and modifications that may reasonably and properly come
within the scope of this invention.